Technical Field
The present invention relates to a term division method. In particular, the present invention relates to a method for dividing a term with appropriate granularity.
Background Art
Compound nouns, each of which is constituted by multiple words (e.g., two to six words), are endlessly created by connecting multiple nouns or affixes corresponding to nouns.
A glossary created during development of a system includes such compound nouns. However, the meaning of a compound noun is often unclear at a glance.
In particular, a glossary for a system of a financial institution may include compound nouns such as (a compound noun constituted by kanji characters) and (a compound noun constituted by kanji characters).
In English, a noun phrase is formed by connecting multiple words. Examples include “beneficiary right seller's business security deposit” and “financial instruments intermediary service”.
Techniques for dividing a compound noun by applying a morphological analysis technique to the compound noun are known (see, e.g., Non Patent Literature 1 and 2 below). However, because the morphological analysis technique divides a compound noun on the basis of the system dictionary and grammar held by a morphological analyzer, a desirable result is not necessarily obtained.
For example, when the morphological analyzer divides (a compound noun constituted by kanji characters) described above using the morphological analysis technique, the compound noun is divided into words such as (words constituted by kanji characters). Furthermore, when the morphological analyzer divides (a compound noun constituted by kanji characters) described above, which is originally one word (that is, one word that is an abbreviation of (written in kanji characters)), using the morphological analysis technique, the compound noun is divided into kanji characters such as (words constituted by kanji characters).
When the morphological analyzer divides “business security deposit” described above using the morphological analysis technique, it is difficult to judge whether to divide it as “business security” + “deposit” or as “business” + “security deposit”, or to leave it undivided as “business security deposit”.
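The ambiguity described above can be illustrated with a minimal sketch. It is not the method of the present invention or of any cited literature. It only enumerates every way of dividing a space-separated noun phrase into contiguous units, to show how many candidates a divider must choose among. The function name candidate_divisions is hypothetical and is introduced here for illustration only.

```python
from itertools import product


def candidate_divisions(phrase):
    """Return every division of a space-separated phrase into contiguous units.

    Hypothetical helper for illustration: each gap between adjacent words
    is either a boundary or not, so n words yield 2 ** (n - 1) candidates.
    """
    words = phrase.split()
    if not words:
        return []
    results = []
    # One boolean per gap between adjacent words: True means "cut here".
    for cuts in product([False, True], repeat=len(words) - 1):
        units = [words[0]]
        for word, cut in zip(words[1:], cuts):
            if cut:
                units.append(word)        # start a new unit
            else:
                units[-1] += " " + word   # extend the current unit
        results.append(units)
    return results


if __name__ == "__main__":
    for units in candidate_divisions("business security deposit"):
        print(" + ".join(units))
```

For the three-word phrase there are four candidates: the three divisions named above and the fully divided form “business” + “security” + “deposit”. The number of candidates doubles with each additional word, so a six-word phrase has 32 candidates, and nothing in the phrase itself indicates which division has appropriate granularity.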
Patent Literature 1 to 10 below describe analysis of a sentence or extraction of a keyword.
Patent Literature 1 JP2007-257390A
Patent Literature 2 JP10-207890A
Patent Literature 3 JP07-85101A
Patent Literature 4 JP2007-264718A
Patent Literature 5 JP08-305695A
Patent Literature 6 JP2001-325284A
Patent Literature 7 JP2010-204866A
Patent Literature 8 JP2011-96245A
Patent Literature 9 JP2008-140359A
Patent Literature 10 JP2012-234512A
Non Patent Literature
Non Patent Literature 1 “Structure Analyzing of Japanese Compound Noun Using Rules and Corpus” by Satoru Ohta et al.; Proceedings of the Third Annual Meeting of the Association for Natural Language Processing, pp. 313-316, March 1997; available from <URL: http://www.anlp.jp/proceedings/annual_meeting/2003/pdf_dir/C6-2.pdf>
Non Patent Literature 2 “Japanese Compound Noun Analysis Using Structuring Rules” by Mitsuhiko Takahashi et al.; Proceedings of the Ninth Annual Meeting of the Association for Natural Language Processing, pp. 541-544, March 2003; available from <URL: http://www.anlp.jp/proceedings/annual_meeting/2003/pdf_dir/C6-2.pdf>
Non Patent Literature 3 “Japanese Compound Noun Structure Analyzer Using Structured Chart Parser” by Masahiro Miyazaki et al.; The Association for Natural Language Processing, 2008; available from <URL: http://www.languetech.co.jp/out/08nlp-miyazaki.pdf>